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ABSTRACT 


For many government departments, uncertainty aversion is a source of barriers in the advancement of data 
openness. A more active response to potential risks is needed and necessitates an in-depth examination of 
risks related to open government data (OGD). With a cross-case study in which three cases from the United 
Kingdom, the United States and China are examined, this study identifies potential risks that might emerge 
at different stages of the lifecycle of OGD programs and constructs a taxonomy model for them. The taxonomy 
model distinguishes the “risks from OGD” from the “risks to OGD”, which can help government departments 
make better responses. Finally, risk response strategies are suggested based on the research results. 


1. INTRODUCTION 


Government information disclosure has played an important role in the democratic development of 
human society since Sweden passed the Freedom of the Press Act in 1766. In particular, since the United 
States issued the Freedom of Information Act (FOIA) in 1966, many countries have passed laws or regulations 
to protect the “right to know” of the public. With the advent of the era of big data, significant attention has 
been focused on the value of open government data (OGD) in promoting government transparency and 
accountability, public participation and social innovation [1, 2, 3, 4]. More than 70 countries have joined 
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the “Open Government Partnership” Program till 2016 [5]. In China, 46 Chinese local governments had 
launched OGD websites as of May of 2018 [6]. In January 2019, the OPEN Government Data Act of the 
United States was signed into law [7]. 


Nevertheless, along with the worldwide advance of OGD, various barriers have been reported by OGD- 
leading countries such as the UK [8], the Netherlands [9], and the USA [10]. A number of studies have 
explored various barriers to OGD. We categorize them into the following six classes according to their 
perspectives on OGD: (1) the data user perspective, e.g., lack of knowledge of access to the OGD data 
sets [11, 12]; (2) the data provider perspective, e.g., institutional barriers [13], basic resources, organizational 
arrangement and technical capacity [14], fear of false conclusions [9], and economic issues [15]; (3) the 
data perspective, e.g., fragmented data sets [16], poorly documented metadata [17], poor data quality [13, 
18], poor information usability [11], poor machine-processability [19, 20], and complex data formats [21]; 
(4) the legislation perspective [13], e.g., difficulties in evaluating the eligibility of a data set [11], difficulties 
in evaluating the privacy sensitivity of a data set [10, 22], and the complexity of data copyright [23]; (5) 
the technology perspective [10, 13], e.g., the way in which the data are stored, obtained and used by a 
department [24]; and (6) the environment perspective, e.g., external pressures [25]. 


Some of these barriers exist in reality, while others simply derive from an insufficient understanding 
of the possible risks. It was reported that a database of registered usernames and email addresses of the 
data.gov.uk had been leaked [26]. In 2018 BBC reported that sensitive information of the military has been 
disclosed in a data visualization map of a fitness tracking company [27]. Besides, some people were also 
concerned that the geospatial data published on data.gov by the Department of Agriculture of the USA 
might be used to locate crops targeted for eradication via infestation, or even to commit acts of biological 
warfare [28]. As shown in above cases, the fear of the potential risks may hinder the advance of OGD. The 
culture of risk aversion is one source of existing institutional barriers to OGD [13]. Uncertainty avoidance 
plays a negative moderating role in the relationship between other organization resources and the OGD 
capacity of government departments [14]. 


The best strategy for addressing risks is understanding and managing them more effectively, rather than 
ignoring or avoiding them. Therefore, we aim to address the following three questions: “What kinds of risks 
are involved when government departments implement OGD initiatives?”, “How are these risks distributed 
over the lifecycle of OGD?” and “What strategies could be adopted?” To answer these questions, we 
conducted a cross-case study on three OGD-related cases from three countries, the cease of the care.data 
program in the UK, the IRS data breach in the USA and the tardy progress of the OGD program in China. 
We then identified 14 risks and categorized them into a taxonomy model which distinguishes “risks to 
OGD” from “risks from OGD". This study deepens the current understanding of OGD-related risks and 
brings new insights into the mechanism design for advancing OGD. 
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2. LITERATURE REVIEW AND ANALYTICAL FRAMEWORK 
2.1 Risks Associated with OGD 


Risk refers to the uncertainty of future results in a given condition and a particular period [29]. The risks 
related to OGD that are frequently discussed are those pertaining to data leakage [30, 31, 32], invasion of 
privacy [33, 34, 35] and other information security issues. [36] summarized 11 government risks in data 
release: copyright, trade secret protection, privacy, the security of the infrastructure, publication of improper 
data or information that might lead to negative attitude towards public institutions, inaccurate data, 
misinterpretation of the data, absence of data consumers, less willing to cooperate, overlapping of data and 
increased number of requests for data. The Office for National Statistics of UK also proposed to better 
balance openness with privacy protection [37]. [38] believes that the risks posed to the official statistics 
department by big data are also related to mission drift, damage to reputation and the loss of public trust, 
inconsistent access and continuity, the fragmentation of approaches across jurisdictions, resource constraints 
and cut-backs, privatization and competition, etc. Besides, OGD are also vulnerable to risks in terms of 
effectiveness, relevance and trust [39]. [15] categorized the risks that may hinder OGD as those related to 
governance, economic issues, licenses and legal frameworks, data characteristics, metadata, access and 
skills. Based on a single case study of Shanghai, [40] summarized the potential risks of OGD from the levels 
of legislation, management and data. 


2.2 Risk Management Related to OGD 


In response to risks faced by government departments, the National Audit Office (NAO) of the UK 
proposed embedding risk management into their core decision-making and planning management processes. 
The NAO put forth the notion that risk management, including identifying, evaluating, processing and 
reporting, can help governments make credible decisions and support innovation [41]. In 2013, a subordinate 
of the US Treadway Commission released the Internal Control-Integrated Framework to address enterprise 
risk management in five aspects: controlled environment, risk assessment, control activities, information 
and communication, and regulatory activities [42]. 


[36] proposed mitigation strategies for the identified risks of OGD, including monitoring and assessment 
of the demand for data, proper specification of data sets to avoid duplication, compliance assessment, data 
anonymization and data aggregation, quality control of data publication, establishment of internal and 
external data catalogue, linking to data sets already published, properly formulated terms and prompts for 
data originating from third parties, clearly explained duties, and continuous monitoring of the impacts of 
OGD initiatives. [43] holds an opinion that the application of tax-related data quality control and data 
technology is a key element in control of tax risks. In 2017, the British government updated the Data 
Protection Act promulgated in 1998 to reinforce the rights of citizens, such as the right to be forgotten, 
personal data, privacy information and data migration stipulating that identifying personal information from 
anonymous data and tampering with them will result in criminal charges [44]. In 2018, the General Data 
Protection Regulation of the EU came into force [45], focusing on protecting and empowering all EU 
citizens regarding data privacy and reshaping the way all organizations including governments approach 
data privacy. 
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The afore-mentioned regulations and studies have fully discussed risk management strategies associated 
with OGD, but they did not address the context in which a specific risk occurs, for example, the stage in 
the data lifecycle, and its source and consequence. Besides, a single country context may not be sufficient 
to reveal the variety of risks in more comprehensive institutional backgrounds. These defects may weaken 
the effectiveness of suggested strategies. In view of this, we have conducted a cross-case study in international 
context from the perspective of the lifecycle of OGD. 


2.3 Lifecycle Model of OGD 


Lifecycle model can guide the process of opening up data [46]. Lifecycle analysis draws on the biological 
analysis method, and divides the development process of objects (records, data, products, projects, 
organizations, etc.), from the stage of generation to that of extinction into several stages. Among others, 
OGD, is also subject to changes in its lifecycle. An analysis based on the lifecycle of OGD can help identify 
the risks that exist at different stages. Meanwhile, risk management itself can also be divided into three 
phases of latency, occurrence and crisis response [47], which necessitates different measures. 


According to [2], the OGD lifecycle comprises three sections, a pre-processing section (data creation, 
selection, harmonization and publishing), an exploitation section (data interlinking, discovery, exploration, 
and exploitation), and a maintenance section (data curation). Based on an investigation conducted in The 
Netherlands, [46] developed a community-driven open data lifecycle model, which comprises identification 
of data, data preparation, data issue, and data reuse and data evaluation. [48] discussed the barriers of 
OGD in China based on a data-centered lifecycle model, including data organization and processing, 
storage and distribution, discovery and acquisition, and appreciation and evaluation. 


With different research purposes, the above-mentioned OGD lifecycle models are either data management 
centered or value realization centered. As the risks of OGD are mainly taken by government departments, 
including both the data providers and users, a government centered lifecycle model is needed. A lifecycle 
model of OGD should consist of at least five stages: data creation and collection, data organization, data 
release, data utilization and data maintenance. 


2.4 Analytical Framework 


Based on the literature review above and the lifecycle of OGD, we develop an analytical framework for 
the subsequent case study, as shown in Figure 1. 
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Figure 1. Analytical framework for case study based on the open government data (OGD) lifecycle. 


3. RESEARCH DESIGN 


3.1 Research Procedure 


The present study adopts a cross-case study method. Although previous studies have discussed some 
OGD-related risks, they are neither integrated into a whole nor detailed from the perspective of the OGD 
lifecycle. To bridge these theoretical gaps, this study adopts the case study method. Compared with a single 
case study, a cross-case study is used to make generalization, and examine themes, similarities and 


differences across cases in quantitative or qualitative analysis. The research design involving multiple cases 


is generally regarded as more robust than that of a single case study, as it provides the observation and 
analysis of a phenomenon in several settings [49]. 


In this study, an analysis across three OGD cases from different countries is conducted. A content analysis 
method is also used to examine the lifecycle distributions of OGD-related risks. The research steps are as 
follows: 


~O U A UN 


Select three cases of OGD in the USA, the UK and China based on their theoretical potentials and 


representativeness in different stages of the OGD lifecycle. 


. Collect case data through the Internet search and semi-structured interviews; 
Identify the risks associated with OGD using within-case and cross-case analysis; 
. Categorize all identified risks into a taxonomy model; 

. Analyze the distributions of all risks over five stages of the OGD lifecycle; 


Suggest countermeasures for OGD risk management. 
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3.2 Case Selection 


Three cases are selected for their theoretical richness and representative ability to answer the research 
questions and address the different stages of OGD programs. They are the UK healthcare data program 
“care.data”, the American Internal Revenue Service (IRS) data breach event and the Open Data initiative 
of Shanghai, China. The “care.data” case is typical, in that it reveals the risks in data collection and sharing. 
The IRS data breach is not a typical OGD case, but it reveals the mismanagement and malicious use of 
government data; it is therefore representative in revealing the risks that are characteristic of the OGD 
maintenance phase. The case of the Shanghai OGD is of typical significance, as it reveals the risk concerns 
of government departments at the early stage of an OGD initiative. The three cases, which are both country- 
and industry-specific, focusing on different phases of the OGD lifecycle, can jointly support the identification 
of risks in the entire lifecycle. 


3.3 Data Collection 


Case data are collected via the Internet search and semi-structured interviews. The former applies to 
cases from the UK and the USA, and covers news reports and online commentaries; the latter apply to the 
case of China and cover three one-to-one in-depth interviews and one focuses on group interview with the 
heads of seven government departments. All the interviews lasted for nine hours in total, and 55,600 
Chinese words were transcribed within one week of the interviews. 


3.4 Data Analysis 


A bottom-up coding approach combined with a cross-case comparison is adopted to analyze the data. 
The data analysis is a continuous process, starting with data collection. First, all concepts or entities related 
to risks or real harms are identified; they are then compared and categorized within the case. Second, the 
results of three cases are listed, analyzed, compared and categorized. Third, the sources and consequences 
of all risks are analyzed. Fourth, a taxonomy model of OGD-related risks is constructed. Finally, the 
distribution of the risks over the five stages of the OGD lifecycle is analyzed and strategies to address them 
are suggested. 


4. CASE ANALYSIS 
4.1 Case 1: The Discontinuation of the care.data Program in UK 
4.1.1 Case Introduction 


In 2013, the long-established National Health Service (NHS) of the UK and the Health and Social Care 
Information Centre (HSCIC) initiated a program called care.data, aiming to improve the safety and care of 
patients by using information; the program also helped create an extensive health records database, whose 
target users include pharmacies, mental health services, opticians, dentists, and education and training 
institutions, and that will eventually support all healthcare facilities. The data have been anonymized and 
only cover the patient’s age range, gender and area of residence, but in exceptional circumstances, such 
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as during a pandemic, a researcher can apply to the Minister of Health for the removal of these privacy 
protections. Researchers believe the data will help them develop new treatments and evaluate NHS 
services. However, the “care.data” project, surprisingly, does not run smoothly. In February 2014, the NHS 
acknowledged a serious crisis of confidence regarding the “care.data” project, and informed family doctors 
to postpone the uploading of patient data for up to six months. In the fall of 2014, the NHS decided to 
endeavor four new pilots to collect medical health data for two million patients, but the first pilot was not 
officially launched until June 2015. Due to poor communication with the public, such as the absence of 
press conferences, and the failed delivery of brochures to families, the NHS’s data collection was collectively 
denounced by patients, doctors, the British Medical Association, the privacy campaign group Big Brother 
Watch and the Association of Medical Research Charities; consequently, one million people withdrew from 
the program. Under enormous pressure, the care.data was stopped by the NHS on July 6, 2016. 


4.1.2 Risk Identification of Case 1 


The NHS is an important source of population data. The care.data is a typical example of government 
data collection and utilization. Through the analysis of data collected from online news reports, blogs, 
comments, etc, the following risks are identified: 


1). Privacy leakage risk. Health data are of a personal and sensitive nature. Mudie, an opponent of care. 
data, said: “The human cost to the patient whose identity and medical history are made public is 
potentially disastrous. Careers could be ended, jobs lost, insurance refused and relationships 
destroyed if sensitive medical facts are made public or used by private firms, other people or, indeed, 
the media” [50]. Although the patient data collected by the care.data had been anonymized, data 
users typically used open data in conjunction with the holder’s closed data and accessible data, and 
the public were worried that malicious analyzers could use specific techniques to identify the 
patient's private information [51]. 

2). Risks from implicit operational specifications. A lack of explicit operational specifications has led to 
various difficulties in the opening and collection of data. The accusations from the public are: “the 
ambiguous standards for obtaining health data posing a risk of trust between doctors and patients 
[52]”, “the regulations over data access, data inspection and balance not yet established or 
implemented [53]”, etc. The NHS did not systematically organize and process the collected data 
sets, leading to perplexed usage and potential safety hazards when cooperating with other agencies. 

3). Risks of improper data use, especially in cooperation with commercial organizations. The focus of 
the public concern is that sharing sensitive medical information with commercial companies can be 
risky without the explicit consent of patients. The care.data hands over coded patient data to the 
insurance industry to help actuaries calculate the average premiums, but the public believe that it 
may identify an individual’s tendency to become ill, causing the insured to be biased against when 
attempting to buy insurance. In the cooperation between NHS and Google’s DeepMind, the public 
are also worried that commercial organizations will use patients’ private data for profit. “Any effort 
to ignore the use of data and only discuss open data policy will fail as they are faced with the 
disorder in reality and compromise in practice [53].” 
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4). Data quality risks. Data quality risks induced by data collection methods, such as incompleteness, 
distortion, and inaccuracies. For example, care.data requires general practitioners to collect patient 
data; however, as a result, data pertaining to young and healthy males may be missing. Some people 
obtain the prescription drugs but flush them down the toilet when unmonitored, causing distortions 
in the drug’s performance data, etc. 

5). Risks from immature techniques. One of the reasons for the public's opposition to care.data is the 
lack of technical sophistication. Sheila Bird, a professor of statistics of Strathclyde University, stated: 
“Data-sharing as proposed by care.data was disastrously incompetent — both ethically and technically. 
Professionals rebelled and prevailed in out-casting care.data, thereby ensuring that future proposals 
will not succeed unless both technically proficient and in the public interest” [54]. 

6). The risk of public trust crisis prompted by the immaturity of government regulation. Of all the 
objections against care.data, many could be regarded as a function of the public trust crisis caused 
by a flawed government supervision system. For example, “when you propose to share our most 
confidential medical records, ambiguous promises and fictitious regulatory frameworks are disturbing 
to the public” [50]. 

7). The risk of poor communication. Huge external pressures on care.data stemmed from lack of 
publicity and ineffective communication. While the project is valuable, it requires the understanding, 
support and cooperation from citizens and other organizations. In November 2014, the All Party 
Parliamentary Group in the British Parliament investigated the care.data project, accusing it of lacking 
transparency and poor publicity, which eventually led to its failure. 

8). The risk of unsustainable funding for public communication. Between October 2013 and 2014, the 
NHS announced £ 2 million to publicize the care.data project to the general public, but the actual 
cost of advocacy was merely £ 1 million. The survey found that less than one-third of the public 
received publicity brochures because of insufficient investment in public communication. This 
eventually led to the project being forced to cease due to tremendous external pressure. 

9). The risk of oversized external pressures. After the care.data program was announced in early 2013, 
some privacy organizations launched the Medical Data Confidentiality Initiative, calling attention to 
security risks in the use of medical data. Since then, these organizations have applied significant 
pressure to initiatives in the collection of medical data. 

10). The risk of unprofessional information governance. In January 2015, the NHS’s Independent 
Information Governance Oversight Panel released a report indicating that the initial commitment to 
the care.data project was not completed, and partly because of the lack of experts in information 
governance. 


4.2 Case 2: The IRS Data Breach in the USA 
4.2.1 Case Introduction 


In 2015, a data breach took place in the Internal Revenue Service (IRS) of the United States [55]. With 
the help of companies such as IBM, the IRS set up a complex data platform networked with other government 
agencies. The site has an application called “Get Transcript” that allows citizens to easily access to previous 
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tax records. In 2015, hackers illegally accessed about 724,000 taxpayers’ tax returns via the Get Transcript 
app. This malicious behavior was not detected until three months later. After the investigation, J. Russell 
George, the Treasury Inspector General for the tax administration, accused the IRS of not deploying the 
Web systems according to the requirements and recommendations, which caused serious data breaches in 
the event of reduced staffing and increased capital input [56]. 


4.2.2 Risk Identification of Case 2 


1). The risk of hacking. Hacking is a constant threat to government applications, and its immediate 
consequences are leakages of privacy, trade secrets or even the security information of national 
infrastructure. 

2). The risk of poorly implemented government regulations. After the implementation of the My Data 
project, the US Open Data Action Plan established clear requirements regarding network and data 
security systems for government departments, but the IRS did not follow these requirements seriously, 
resulting in a colossal security breach. 

3). The risk of poor communication and cooperation between departments. After the data breach event, 
the IRS, the Treasury Department and other departments were at odds with each other. The poor 
communication and cooperation between them resulted in an inadequate response to the risk. 

4). The risk of delayed system updates and maintenance. The maintenance and update of an open 
system platform must be carried out periodically; otherwise, security risks are likely to occur. In this 
case, some IRS applications were outdated and had many security vulnerabilities; this was coupled 
with poor maintenance of the systems and platforms. Together, these factors were vulnerable to 
hacking, which remained undetected for three months. 

5). The risk of unsustainability in capital investment. According to the IRS, an increasing number of 
government information systems security services are being outsourced; with the delay of at least $ 
400 million in investments due to IT budget cuts, many of the operations, including the maintenance 
and replacement of old IT systems, are affected, thereby resulting in system failure and security 
loopholes. 


4.3 Case 3: The Tardy Progress of the OGD Program in China 


During the short history of OGD in China, no major crisis has taken place yet. Therefore it is impossible 
to have an event as a case study that exhibits the lifecycle. Instead, we only investigate the relevant risk 
factors by examining a representative city government. Between December 2016 and May 2017, we 
interviewed seven government departments in Shanghai, including the Human Resources and Social 
Security Bureau, the Agriculture Commission, the Food and Drug Administration, the Audit Bureau, the 
Health and Family Planning Commission, the Trade and Industry Bureau, and the Planning and Land 
Resources Administration. According to the analysis of interview data, the risk concerns of the departments 
related to OGD can be summarized into the following six aspects: 
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1). Risk of data distortion. Government statistical data, though eye-catching and frequently used by the 
public, are prone to distortion in the layer-by-layer hierarchical reporting system; this has become a 
systemic problem. One interviewee said: “It is not that our attitude to OGD is inactive, but that the 
authenticity of data reported by the subordinate departments cannot be guaranteed”. 

2). Risk of low data quality. Given that some government departments’ data are collected from enterprises 
or reported by subordinate departments and other diversified channels, some data are thus flawed 
with inconsistent format, unstandardized metadata, incompleteness and other quality problems, 
which forestalls some departments from engaging in OGD pilots. One respondent noted: “The data 
collected directly from medical institutions and health administration bureaus at county or district 
level were found countless quality problems. We’ve been doing data cleaning for two-three 
consecutive years, so a lot of data have not yet been released.” 

3). Risk of data aggregation. Although open data have been anonymized and desensitized, some 
departments are still worried about the existence of security risks for collective data release. One 
interviewee said: “The bulk of our data are disease information or disease prevention information, 
if released, it implicates personal privacy.” 

4). Risk of undefined operational norms. Some departments expressed that due to the lack of technical 
standards and codes of practice, they are quite at sea for some problems in open data. One 
interviewee suggested that “the scope of disclosure also needs to be stipulated. We are willing to 
participate in the construction and also want to provide useful data, but the specification of the 
application needs to be well defined.” 

5). Risk of imperfect mechanisms. The lack of supervision, feedback and incentive mechanism led to 
the slow progress of the OGD project. On account of the scanty feedback on the use of open data, 
the enthusiasm of some departments for OGD has been dampened. One interviewee said: “We are 
in want of a mechanism for feedback and supervision, we would like to know where and to what 
extent the data we provide have realized their value. All these require feedback, which is also a 
positive incentive for us.” 

6). Risk of data value-sparseness. Some departments want to know if their data are really useful to the 
public. “Every year we provide a lot of open data to the community (for social services), and a great 
deal of work has been done, but | am still a stranger to the data they (neighborhoods) are using and 
how they use them.” “We'd love to know who is using the data.” 


5. RISK INTEGRATION AND CLASSIFICATION 


Through iterative comparison and merging, the risks identified from the three cases are integrated into 
14 types of risks: content legitimacy, data quality, data value, data management, platform support, information 
security, organizational support, resource input, institutional support, business process, use, imperfect 
regulations and standards, scarce external resources and external pressures. The 14 types of risks are then 
further categorized based on their sources and consequences, respectively. 


318 Data Intelligence 


202211.00467v1 


chinaXiv 


ChinaXiva {ERAT 
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis 


5.1 Classification Based on the Risk Source 


According to the source, the 14 types of risks are categorized into five classes: data (related) risks, 
technology (related) risks, management (related) risks, utilization (related) risks and external environment 
(related) risks. 


5.1.1 Data Risks 


Most of the data published on OGD websites are submitted by various government departments, while 
a small part of them are created by the OGD department itself. These data vary in their attributes accordingly 
with the clear-cut department functions. Some attributes may lead to uncertain consequences if the data 
are open. For instance, some data concern citizens’ privacy, and their inappropriate disclosure may bring 
harm to the individuals involved. These risks arising from data attributes are categorized as data risk, which 
is mainly related to the legitimacy of the open content, the data quality, and the data value. 


The risk in the legitimacy of open content refers to the possibility that the open content does not comply 
with laws, regulations, or other social norms. The risk of low data quality includes the possible distortion, 
error, obsolescence, or incompleteness of data content, format chaos, absent links or bad metadata. This 
kind of risks may bring inestimable losses to data users or damage the government's credibility. The main 
problems lie in the phase of data collection and pre-processing. Some desensitized data also suffer from 
quality degradation after de-identification. The risk of low data value refers to the possible investment loss 
of resources due to opening data that have trivial value, have no clear users, or are rarely used. 


5.1.2 Technology Risks 


Technology risks result from inadequate technical capacity and tardy understanding of new technologies, 
such as data management, platform support and information security. Technology problems in data 
management may lead to unsatisfactory effects of OGD, such as the use of outdated techniques for data 
collection, improper methods for data cataloguing, unstandardized metadata and improper data formats 
that encumber analysis and utilization. The low capacity and delayed updates of the OGD platform bring 
disastrous consequences, as evinced in case 2. Because of the loopholes in the information security 
technologies of OGD platforms, worrisome consequences, such as privacy leakage, data tampering and 
corruption, platform damage or falsely authorized certification, may occur. 


5.1.3 Management Risks 


Management risks are caused by problems that exist in OGD-related organizational structures, business 
processes, management styles, or mechanism designs; they comprise the following four types: organizational 
risks, resource input risks, institutional risks and business process risks. The organizational problems may 
exist in the lack of special position setting, the unbalanced allocation of power and responsibility between 
the data provider and receiver, poor inter-departmental communication and cooperation, weak OGD 
promotion strategies, poor management of business outsourcing, etc. Insufficient investment in talent or 


Data Intelligence 319 


202211.00467v1 


chinaXiv 


ChinaXiva {ERAT 
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis 


funds in OGD-related businesses may directly hinder the promotion of OGD. Incomplete management 
rules, business guidelines, incentive mechanisms or ill-conceived business processes upon OGD may also 
bring unwanted consequences. 


5.1.4 Utilization Risks 


Utilization risks arise from misuse, malicious or improper use, or insufficient use of the open data, 
which may result from insufficient literacy or capacity of OGD users. This may result in erroneous decision 
making, invasion of privacy, or insufficient exploitation of data value. 


5.1.5 Environment Risks 


Environment risk refers to a harmful impact upon the process, mode, or outcome of an OGD program 
due to the limitations of current institutions, resources, or public expectations, including: (1) Risk of 
imperfect regulations and standards. Some obstacles to OGD may be attributable to imperfect laws or 
regulations, inoperable technical standards, or rigid government administration systems. (2) Risk of scarce 
external resources, e.g., the lack of data governance experts, insufficient talent supply, or immature 
knowledge of OGD. (3) Risk of excessive external pressure. As shown in case 1, online negative opinions 
of pressure groups exerted significant pressure on the initiative of OGD. 


5.2 Classification Based on the Risk Consequence 


Judging from the consequences, the afore-mentioned 14 subtypes of risks can be divided into two classes: 
risks to OGD and risks from OGD. The risks to OGD may hinder the smooth running of an OGD program 
but will not undermine the legitimacy of OGD itself. These risks comprise10 sub-categories: risks in data 
management, information security, platform support, organizational adjustment, resource investment, 
institutional provision, business process, imperfect regulations and standards, scare external resources, and 
excessive external pressures. The risks from OGD refers to the possible negative effects caused by OGD, 
which may lead to challenges to the legitimacy of the OGD, including illegal contents, low data value, 
poor data quality and improper data utilization. The risks from OGD may lead to a sceptical attitude to 
OGD initiative, while the risks to OGD may influence the success of OGD program. The fear of both may 
become the actual barriers to OGD. 


5.3 A Holistic Taxonomy Model of Risks Associated with OGD 


Based on the analysis above, a holistic taxonomy model of OGD-related risks is constructed, as shown 
in Figure 2. First, all risks identified from the three cases are listed together and compared. Those that are 
similar are clustered and categorized into an upper class according to their sources and consequences, 
respectively. Next, the correspondence between the source-based and consequence-based classifications 
is analyzed. Finally, a holistic taxonomy model of OGD-related risks is constructed. This model can help 
government departments and the public better understand OGD-related risks and thus devise more efficient 
response strategies. 
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Figure 2. A taxonomy model of open government data (OGD)-related risks. 


6. RISK DISTRIBUTION OVER THE LIFECYCLE OF OGD 
6.1 Distribution of Risks from Five Sources 


Using a content analysis for the qualitative data of the three cases, the lifecycle distributions of risks from 
five sources are revealed, as shown in Table 1. The frequency of each type of risk occurring at every stage 
of the OGD lifecycle is calculated. At different stages, the distributions of the 14 types of risks are shown 
in Figure 3. Among the five stages, the data collection stage is risk intensive. This implies that an appropriate 
risk management plan should be made before the OGD program begins. 


Data Intelligence 321 


202211.00467v1 


chinaXiv 


Chinaxiva(ERATY 
Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis 


Table 1. The lifecycle distribution of open government data (OGD)-related risks. 
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Risks Data risks Content legitimacy a v v 3 
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Risks to Technology risks Data management v v 4 v v 5 
OGD Platform support v v v v v 5 
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Management risks Organizational v v v v 4 
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Resource input v v v v v 5 
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Business process v v v v 4 
Environment risks Imperfect regulations v v v v v 5 
and standards 
Scarce external v v v 3 
resources 
External pressures v v y 3 
Sum 13 11 11 10 11 56 
External pressures MEM = =—CS~—S 
Scarce external resources MMM oae) 
Imperfect regulations and standards EE TT TT 
Business process as =] 
Institutional support 
Resource input a as 
Organizational support ET E 
Information security Et 
Platform support ee 
Data management a 
Use bs] 
Data value ES 
Data quality m SSS 
Content ligitinacy 


=Collection #Organization ®Release ®Utilization = Maintenance 


Figure 3. The lifecycle distribution of 14 types of open government data (OGD)-related risks. 
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6.2 The Occurrence Stages of 14 Types of Risks 


A radar chart is made to show the number of the occurrence stages of 14 risks. As shown in Figure 4, 
the risks that may occur at all five stages include those related to data value, data management, platform 
support, information security, resource input, institutional risk, and flawed regulations and standards. This 
indicates the importance of a strong technical platform and a sound management system for the success 
of the OGD program. The risks that may occur at four stages include organizational and business process 
risk, which still imply the importance of OGD project management. The risks that may occur at three stages 
are those related to content legitimacy, data quality, scarce external sources and external pressures. 


Among the risks that occur at all five stages, only the risk related to data value is a risk from OGD. This 
means that, except their natural characteristics, the value of the open data should be improved at every 
stage of the OGD lifecycle. The other three types of risks from OGD emerge at three or fewer stages. The 
risk of data use only occurs at the utilization stage. This implies that a specific type of risk that may be 
brought by OGD should be controlled or avoided at a few specific stages. Most of the risks to OGD could 
be mitigated by improving project management and a few of them could be avoided by strengthening 
regulations and technical standards, e.g., the enforcement of EU GDPR. 
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Figure 4. The number of the occurrence stages of 14 types of risks. 
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7. RISK MANAGEMENT STRATEGIES FOR OGD PROJECTS 


Risks from OGD and risks to OGD have different sources and consequences and are distributed at 
different stages in the OGD lifecycle. Strategies in response to them are suggested as follows: 


7.1 Response Measures for Risks from OGD 
7.1.1 Data Governance Strategy for the Risk in Content Legitimacy 


The legitimacy of the open content is the first concern of many government departments, and has even 
become the default explanation for any deficiency in data openness. In response to this kind of risk, the 
government must accelerate the enactment of related laws and regulations (e.g., privacy protection act, and 
data security act), establish examination criteria for privacy-involved data and build an effective data 
governance system to ensure the legitimacy of open contents. In May 2018, the GDPR was enforced by 
the EU, empower people with more control over their personal data collected by all companies operating 
in the EU [45]. In the Nanhai district of Foshan city in Guangdong province, China, a data governance 
committee was established to determine the legitimacy of open data [30]. 


7.1.2 Value Appraisal and Feedback Mechanism for the Risk in Data Value 


Among the departments that are inactive to data openness, some are unconvinced of the value of the 
data to be released. To ensure the value of data that have yet to be disclosed, it is necessary to establish a 
mechanism for data value appraisal mechanism. Further, a feedback mechanism should be set up to inform 
the data providing departments about the results of data utilization, thus giving them sufficient information 
to determine the priorities for future releases. 


7.1.3 Quality Management and Data Provenance Strategy for the Risk in Data Quality 


In response to the risk in data quality, guidelines and evaluation standards for stable data formats and 
metadata, secure platform and data provenance are needed to ensure the authenticity, integrity and usability 
of data at each stage of the lifecycle of an OGD program. For instance, the OPEN Government Data Act 
of the USA requires federal agencies to publish their information online based on an underlying open 
standard that is maintained by a standard organization [57]. 


7.1.4 Legislation Strategy for the Risk in Data Utilization 


In response to data utilization risks, it is necessary to clearly stipulate the purpose, means, scope, and 
results of data utilization by speeding up the legislation, and mete out legal penalties for the malicious use 
of data. 
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7.2 Improving Risk Management of OGD Projects 
7.2.1 Through the Lifecycle of the Risk Itself 


During the different periods of the risk lifecycle, different strategies are needed to warn, evaluate and 
respond appropriately. [58] proposed the establishment of a data risk warning mechanism, an internal 
control mechanism and the fostering of risk coping ability. (1) During the period of latent risks, government 
departments must actively analyze and identify potential risks, make efforts to avoid and transfer them, and 
establish reserved countermeasures for use in times of crisis. In this phase, although the government of 
Shanghai had foreseen possible adverse consequences at the early stages of their OGD program, they were 
not able to formulate a complete risk response plan because they lacked an understanding of OGD-related 
risks. (2) In the risk occurrence phase, it is necessary to initiate the emergency plan immediately and take 
effective measures to minimize losses as much as possible, including timely cut-loss, communication 
enhancement and winning public understanding. Due to the failure of taking measures in time to pledge 
public understanding since 2 million publicity funds were not invested as originally planned, the care.data 
initiative of UK had to be stopped under public pressures [50]. (3) After the risk occurs, an emergency plan 
should be initiated immediately to minimize any adverse effects. Due to its limited capacity for crisis 
management and poor inter-departmental communication, the IRS failed to spot and solve the problem in 
time after the hacker invasion, and allowed the data-theft to last undetected for three months [59]. 


7.2.2 Through the Lifecycle of an OGD Project 
At all stages of the lifecycle of an OGD project, it is necessary to implement corresponding risk control 
strategies. 


1). At the stage of data collection, a complete OGD plan is needed. A detailed handbook should be 
developed to set the data open scope, conditions and technical means, especially the security level 
of data involving secrets. 

2). At the data organization stage, a set of operable technique standards is needed so as to ensure 
trustworthy data quality, such as metadata, data preprocess, data catalogue, and data documentation. 

3). At the stage of data release and utilization, detailed regulations should be made to ensure proper 
data use. Besides, emergency plans should be formulated in response to possible privacy leakage, 
malicious use, hacker and virus and weak publicity. 

4). At the maintenance stage, plans should be made to prevent unsustainable resource investment to 
ensure the smooth progress of the project. 


8. CONCLUSION AND DISCUSSION 


Risk aversion is a potential factor of resistance to OGD, which stems largely from the lack of understanding 
rather than their uncontrollability. The construction of a taxonomy model of risks can help government 
departments understand them better, and thus avoid conservative inaction. By conducting a cross-case 
analysis on three cases of OGD in the UK, the USA, and China, this study identified 14 types of risks 
associated with OGD. According to the risk source, the 14 types of risks are classified into five classes: 
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data, technology, management, utilization, and environmental risks. According to the risk consequence, 
the 14 types of risks are classified into two groups: risks to OGD and risks from OGD. A holistic taxonomy 
model of OGD-related risks is then constructed. Based on the model, the distribution of each type of risk 
across five stages of the OGD lifecycle is analyzed. It is found that the stage of data collection is risk- 
intensive and seven types of risks may emerge at all five stages. It is also found that most of the risks to 
OGD could be avoided or mitigated by improving risk management throughout the lifecycle of an OGD 
project. In response to the risks from OGD, it is necessary to improve data governance, data value appraisal 
and feedback, quality management, and data provenance and speed up legislation. 


Among the 14 types of identified risks, some have been discussed in previous studies, such as privacy 
leakage [60, 61], low data quality [17, 62], and improper use [63], but beyond these, this study also 
identifies several new risks associated with OGD, including those concerning content legitimacy, data 
value, weak data management, excessive external pressures, institutional flaws, and external resource 
scarcity. Furthermore, this study also makes the following contributions to this field: firstly, it distinguishes 
the risks to OGD from the risks from OGD, and thus differentiates the legitimacy of OGD from its smooth 
realization. This has a theoretical implications on deepening the understanding of OGD-related risks and 
supporting the rationality of an OGD initiative. Secondly, this study identifies OGD-related risks at different 
stages of the lifecycle of an OGD program from three cases in different countries and builds a holistic 
taxonomy model after clustering them. This fills the gap left by the deficiencies of previous studies that 
focused on a single country and presented the risks fragmentarily. Third, this study analyzes the distribution 
of each type of risk over five stages of the lifecycle of OGD and suggests specific strategies in response to 
them. This has practical implications for government departments that are promoting OGD. 


The limitations of this study include: (1) Due to the authors’ geographical limitations, the data for the 
cases of the UK and the USA are mainly collected from online sources. Although Google renders a 
comprehensive search, the absence of interviews with related government officials has prevented the 
researchers from directly assessing the attitudes of government staff on this point. To make up for this 
limitation, several rounds of in-depth interviews were conducted in the case of Shanghai. (2) The taxonomy 
model of OGD-related risks is built with a bottom-up induction approach from three cases. Although it can 
deepen the understanding of the OGD-related risks, it might have not involved all the potential risks that 
face other government departments. In the future, we will extend our approach to improve the taxonomy 
model of OGD-related risks with more cases and to empirically test the correlation between risk management 
and OGD performance. 
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